The process of admitting new students at Universitas Islam Negeri Raden
Fatah produces a large amount of new student data each year, so student
data accumulates continuously. The purpose of this study is to compare
deep learning, naive Bayes, and random forest on new student admission
data, and to provide a basis for deciding the promotion strategy of each
study program. The data mining method used is knowledge discovery in
databases (KDD), and the tool used is RapidMiner. The attributes used are
student ID number, name, study program, faculty, gender, place of birth,
date of birth, year of entry, school origin, national examination, type of
payment, and nominal payment. The new student data from 2016 to 2019
comprised 18,930 items. Deep learning resulted in an accuracy of 52.65%,
naive Bayes in an accuracy of 99.79%, and random forest in an accuracy of
44.65%.
1. INTRODUCTION

Information technology plays an important role in most organizations that collect and manipulate
data in large databases. Decisions are made based on useful information derived from the stored data. The
process of discovering patterns in stored data to support data analysis is called data mining [1]. Along
with the development of the internet, the amount of stored data, whether text, images, sound, or video, has
also increased quickly and significantly. In Indonesia, there were only 500,000 internet users in 1998,
whereas by 2015 the number was projected to have reached 139 million [2]. Information can be identified
by its characteristics [3]. A large volume of data becomes "garbage" in storage if it is not
processed into useful information. Data mining technology provides a user-oriented approach to finding novel
and hidden patterns in the data [4]. Mixed data models that have many topics can form a text data set model [5].
This is consistent with the definition of data as recorded facts that have no meaning by themselves. Many
universities have used information technology to support the admission process [6]. The application of
information technology to education can also produce abundant data on students and learning processes [7]. At
universities, data such as student data can be obtained from databases and will continue to grow. It is hoped
that data mining techniques can be used to help analyze data in higher education institutions.
The process of admitting new students at Universitas Islam Negeri Raden Fatah produces a large
amount of new student data every year. This happens continuously, so student data accumulates and keeps
growing, which complicates the search for student information. By managing and organizing this data, the
university can obtain useful information from it, for example to plan university promotion or outreach for
study programs in schools that supply new students. Currently, the university visits schools for promotion,
which wastes budget because too many schools are visited and is not time efficient. This research classifies
data on the admission of new students at Universitas Islam Negeri Raden Fatah by applying classification
techniques within the data mining process and comparing three algorithms: deep learning, naive Bayes, and
random forest. The tool used is RapidMiner. The attributes used are student ID number, name, study program,
faculty, place of birth, gender, date of birth, school origin, year of entry, national examination, type of
payment, and nominal payment. The results of deep learning, naive Bayes, and random forest can be used to
determine the promotion strategy of each study program and to see which study programs are of interest at
each school. The final results can help the University [8].

The concept of data mining is to extract hidden patterns and discover relationships between parameters
in a vast amount of data [9]. Data mining is the process of extracting information, knowledge, or patterns
(previously unknown, implicit, and considered useless) from large amounts of data [10]. Data that is
considered "garbage" because it is unpatterned, unstructured, and not useful is processed (filtered) so that it
forms new and useful information, knowledge, or patterns [9]. Data mining is a process of exploring added
value in the form of information that was not previously known to exist in a database. Patterns that are very
useful and more valuable than the raw data contained in the database are obtained by recognizing the
information extracted from it [11]. From the explanation above, it can be concluded that data mining is the
analysis step in the process of knowledge discovery in databases. Data mining is a process that employs one or
more machine learning techniques to analyze and extract knowledge automatically [12].

A method based on learning from features that are not directly observed is called deep learning [13].
The naive Bayes algorithm is a classification algorithm based on Bayes' theorem in statistics. The
naive Bayes algorithm can predict the probability that a record belongs to a class [14]. The naive Bayes
method is a starting point for building a method designed on a corpus that has already been formed [15].
Naive Bayes is often referred to by Bayes' rule, which is a foundation for data mining and machine learning
methods. It builds a predictive model, which offers a way to explore and learn more from data [16]. Random
forest is a category of data modeling that works with binary data [17]. Classification techniques based on
naive Bayes can be used with very high-dimensional inputs. It is a simple algorithm but can produce better
results than other algorithms [18].
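
To make the naive Bayes idea concrete, the following is a minimal sketch of a categorical naive Bayes classifier written in plain Python. It is not the RapidMiner operator used in this study. The function names, the toy records, and the use of add-one (Laplace) smoothing are assumptions introduced only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] -> count
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, needed for add-one smoothing.
    distinct_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            distinct_values[i].add(value)
    return class_counts, value_counts, distinct_values


def predict_naive_bayes(model, row):
    """Return the class with the highest posterior log-score for one row."""
    class_counts, value_counts, distinct_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # log P(class) + sum of log P(value | class), with add-one smoothing
        # (the smoothing is an assumption of this sketch).
        score = math.log(count / total)
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(distinct_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy records: (gender, school origin) -> study program.
rows = [("F", "School A"), ("F", "School A"), ("M", "School B"), ("M", "School B")]
labels = ["Hadith Science", "Hadith Science", "Islamic Philosophy", "Islamic Philosophy"]
model = train_naive_bayes(rows, labels)
prediction = predict_naive_bayes(model, ("F", "School A"))
```

The "naive" part is the assumption that attributes are conditionally independent given the class, which is why the per-attribute log-probabilities are simply summed.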


2. RESEARCH METHOD 

The field of study that focuses on methodologies for extracting useful knowledge from data is
called knowledge discovery in databases (KDD). The rapid and ongoing growth of online data, driven by the
widespread use of the internet and databases, has created a great need for KDD methodologies [19]. The
challenge of extracting knowledge from data draws on research in databases, machine learning, pattern
recognition, and statistics, with the aim of delivering smart and sophisticated business
solutions [20]. In this study, the admission data is processed using the stages of knowledge discovery in
databases (KDD), as illustrated in Figure 1. KDD is the process of determining useful information and
patterns in data. This information is contained in a large database, was previously unknown, and is
potentially useful. Data mining is one step in the iterative KDD process [21].


2.1. Data selection 

In this process, a data set is selected: a target data set is created, or the focus is placed on a
subset of variables (data samples) on which discovery will be performed [22]. The results of the selection are
stored in a file separate from the operational database. The attributes used are student ID number, name,
study program, faculty, gender, school origin, year of entry, national examination, type of payment, and
nominal payment. The data in this study were sourced from Universitas Islam Negeri Raden Fatah and are
secondary data consisting of new student data from 2016 to 2019. A total of 18,930 records were obtained,
each consisting of student ID number, name, study program, faculty, place of birth, gender, date of birth,
school origin, year of entry, national examination, type of payment, and nominal payment. Some examples of
the new student data are shown in Table 1. The stages of the knowledge discovery in databases (KDD) process,
illustrated in Figure 1, consist of data selection, pre-processing and cleaning, transformation, data mining,
and interpretation/evaluation.

Figure 1. Stages in KDD (diagram labels: target data, preprocessed data, transformed data, patterns, interpretation/evaluation)


Table 1. New student data obtained (cells marked "(illegible)" could not be recovered from the source)

No  Student ID Number  Name             Study Program                        Gender  Year of Entry  School Origin                National Examination  Type of Payment    Nominal Payment
1   1683600004         Siska Apriyanti  Hadith Science                       F       2016           Islamic school Nurul Hikmah  77                    Group 3            Rp. 1.800.000
2   (illegible)        Sifaul Hasanah   Quranic Sciences and Interpretation  M       2016           Islamic school Nurul Hikmah  (illegible)           Group (illegible)  Rp. 1.800.000
3   (illegible)        Maulidin Anam    Quranic Sciences and Interpretation  M       (illegible)    Islamic school Nurul Hikmah  (illegible)           Group (illegible)  Rp. 1.800.000
4   1683400008         Bayu Putra       Aqeedah and Islamic Philosophy       F       2016           Senior High School Puspita   85                    Group 3            Rp. 1.800.000


2.2. Pre-processing and cleaning 
Data pre-processing and cleaning are done by removing inconsistent data, noise, and duplicate data
and by correcting data errors. The data can also be enriched with relevant external data [23].
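
The cleaning step can be sketched as follows. This is an illustrative example in plain Python, not the procedure actually applied in RapidMiner. The field names, the toy records, and the rule "drop a record if a required field is empty" are assumptions made for the example.

```python
def clean_records(records, required_fields):
    """Drop records with missing required fields and exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        # Normalise whitespace so that " F " and "F" are treated alike.
        normalised = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Skip records where a required field is missing or empty.
        if any(normalised.get(field) in (None, "") for field in required_fields):
            continue
        # Use the sorted (field, value) pairs as a duplicate-detection key.
        key = tuple(sorted(normalised.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(normalised)
    return cleaned


# Hypothetical toy records (IDs and values are made up for illustration).
raw = [
    {"student_id": "A001", "gender": " F ", "year_of_entry": 2016},
    {"student_id": "A001", "gender": "F", "year_of_entry": 2016},  # duplicate after normalising
    {"student_id": "A002", "gender": "", "year_of_entry": 2016},   # missing gender
]
cleaned = clean_records(raw, required_fields=["student_id", "gender"])
```

In this toy input only one record survives: the second is a duplicate of the first once whitespace is stripped, and the third lacks a gender value.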


2.3. Transformation 
This process transforms or combines data into a form more appropriate for the mining process, for
example by summarizing it (aggregation).
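
As an illustration of aggregation, the sketch below counts how many admitted students each school sends to each study program. The field names `school_origin` and `study_program` and the toy records are assumptions for the example; the paper does not specify which aggregation was used.

```python
from collections import Counter


def count_by_school_and_program(records):
    """Aggregate records into counts per (school origin, study program) pair."""
    return Counter((r["school_origin"], r["study_program"]) for r in records)


# Hypothetical toy records.
records = [
    {"school_origin": "School A", "study_program": "Hadith Science"},
    {"school_origin": "School A", "study_program": "Hadith Science"},
    {"school_origin": "School A", "study_program": "Islamic Philosophy"},
    {"school_origin": "School B", "study_program": "Islamic Philosophy"},
]
counts = count_by_school_and_program(records)
# The most frequent (school, program) pair and its count.
top_pair, top_count = counts.most_common(1)[0]
```

A summary of this kind is one way to see which study programs attract students from which schools, which is the question the promotion strategy depends on.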


2.4. Data mining 
Data mining is the cycle of obtaining interesting patterns or information from data using a technique [24],
method, or algorithm chosen according to the objectives of the KDD process [10].


2.5. Interpretation/evaluation 

This stage translates the patterns generated by data mining. The patterns or information found are
evaluated (tested) to see whether they are consistent with or contradict previous facts or hypotheses.
Knowledge obtained from the patterns is presented in the form of visualizations.
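
A common way to evaluate a classifier at this stage is to tabulate a confusion matrix and compute the accuracy from it. The sketch below shows both in plain Python. The class labels and predictions are made-up toy values, not results from this study.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual class, predicted class) pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of records whose predicted class equals the actual class."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical toy labels for three classes.
actual = ["A", "A", "B", "B", "C"]
predicted = ["A", "B", "B", "B", "C"]
matrix = confusion_matrix(actual, predicted)
acc = accuracy(actual, predicted)
```

The diagonal entries of the matrix, such as `matrix[("B", "B")]`, are the correct predictions, and the accuracy is their sum divided by the number of records.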


3. RESULTS AND ANALYSIS 
3.1. Deep learning 

The processing of new student data using deep learning in RapidMiner is shown in Figure 2, and the
details of the validation process are shown in Figure 3. The deep learning model in Figure 2 was trained on
18,930 items of training data (new student admission data from 2016 to 2019). The accuracy obtained with
deep learning is 52.65%, as shown in Figure 4. Besides the accuracy value, deep learning also produces a
kappa value of 0.511, a correlation value of 0.804, and a cross-entropy value of 1.793, as shown in Figure 5.


Figure 2. Deep learning modeling on RapidMiner (operators shown: Read Excel, Validation)

Figure 3. Validation deep learning

Figure 4. View accuracy deep learning (confusion matrix; accuracy: 52.65%)


3.2. Naive bayes 

The processing of new student data using naive Bayes in RapidMiner is shown in Figure 6. The naive
Bayes model in Figure 6 was trained on 18,930 items of training data (new student admission data from 2016
to 2019) and tested on 4,762 items of 2019 new student admission data. The accuracy obtained with naive
Bayes is 99.79%, as shown in Figure 7. Besides the accuracy value, naive Bayes also produces a kappa value
of 0.998, a correlation value of 0.998, and a cross-entropy value of 0.029, as shown in Figure 8.
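
The training and testing sets described above can be derived from the year of entry, as in the following sketch. The function name, the `year_of_entry` field, the toy records, and the optional `exclude_test_from_train` flag are assumptions for illustration and are not part of the RapidMiner process.

```python
def split_by_year(records, test_year, exclude_test_from_train=False):
    """Split records into training and testing sets by year of entry."""
    test = [r for r in records if r["year_of_entry"] == test_year]
    if exclude_test_from_train:
        # Held-out variant: the test year is removed from the training set.
        train = [r for r in records if r["year_of_entry"] != test_year]
    else:
        # Mirrors the setup described in the text: all years are used for training.
        train = list(records)
    return train, test


# Hypothetical toy records; only the year of entry matters for the split.
records = [
    {"student_id": "A001", "year_of_entry": 2016},
    {"student_id": "A002", "year_of_entry": 2017},
    {"student_id": "A003", "year_of_entry": 2018},
    {"student_id": "A004", "year_of_entry": 2019},
    {"student_id": "A005", "year_of_entry": 2019},
]
train, test = split_by_year(records, test_year=2019)
held_out_train, _ = split_by_year(records, test_year=2019, exclude_test_from_train=True)
```

With the default setting the 2019 records appear in both sets, as in the described setup. The held-out variant keeps them out of training, which is the usual choice when the goal is to estimate performance on unseen data.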

3.3. Random forest

The processing of new student data using random forest in RapidMiner is shown in Figure 9. The
random forest model in Figure 9 was trained on 18,930 items of training data (new student admission data
from 2016 to 2019) and tested on 4,762 items of 2019 new student admission data. The accuracy obtained
with random forest is 44.65%, as shown in Figure 10.


kappa: 0.535 
Figure 5. View kappa deep learning

Figure 7. View accuracy naive Bayes

Figure 8. View kappa naive Bayes

Figure 9. Random forest modeling on RapidMiner

Figure 10. View accuracy random forest


Besides the accuracy value, random forest also produces a kappa value of 0.421, a correlation
value of 0.729, and a cross-entropy value of 2.003, as shown in Figure 11. Whether the accuracy, precision,
and recall values indicate good classification results can be judged using the classification result
parameter guidelines shown in Table 2 [25]. The comparison of the three algorithms, namely deep learning,
naive Bayes, and random forest, is shown in Table 2.

Naive Bayes has the highest accuracy (99.79%), the highest kappa value (0.998), and the highest
correlation value (0.998). Random forest has the highest cross-entropy value (2.003). Because a lower
cross-entropy indicates better predictions, naive Bayes also performs best on this measure, with a value of
0.029.

Figure 11. View accuracy random forest


Table 2. Results comparison of all three algorithms

Result          The algorithm
                Deep learning   Naive Bayes   Random forest
Accuracy        52.65%          99.79%        44.65%
Kappa           0.511           0.998         0.421
Correlation     0.804           0.998         0.729
Cross-entropy   1.793           0.029         2.003
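
For readers unfamiliar with the kappa and cross-entropy rows, the sketch below shows how these two measures are commonly defined. It uses Cohen's kappa and the mean negative natural-log probability of the true class. Whether RapidMiner uses exactly these definitions (for example the same logarithm base) is an assumption, and the toy labels and probabilities are made up.

```python
import math
from collections import Counter


def cohen_kappa(actual, predicted):
    """Agreement between actual and predicted labels, corrected for chance."""
    n = len(actual)
    observed = sum(1 for a, p in zip(actual, predicted) if a == p) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Agreement expected if predictions were independent of the actual labels.
    expected = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    return (observed - expected) / (1 - expected)


def cross_entropy(actual, predicted_probabilities):
    """Mean negative log-probability assigned to the true class (natural log)."""
    total = 0.0
    for label, probabilities in zip(actual, predicted_probabilities):
        total += -math.log(probabilities[label])
    return total / len(actual)


# Hypothetical toy labels and class probabilities.
actual = ["A", "A", "B", "B"]
predicted = ["A", "A", "B", "A"]
kappa = cohen_kappa(actual, predicted)

uniform = [{"A": 0.5, "B": 0.5}] * 4
entropy = cross_entropy(actual, uniform)
```

Kappa is 1 for perfect agreement and 0 for agreement no better than chance, while cross-entropy is 0 for perfectly confident correct predictions and grows as the true class receives less probability.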


4. CONCLUSION 

Based on the research and discussion carried out, three methods, namely deep learning, naive
Bayes, and random forest, were compared to determine the best student recruitment promotion strategy at
Raden Fatah State Islamic University in Palembang, with reference to the original data. The training data
consisted of 18,930 items of new student data from 2016 to 2019, and the testing data consisted of 4,762
items of new student data from 2019. Deep learning resulted in an accuracy of 52.65%, naive Bayes in an
accuracy of 99.79%, and random forest in an accuracy of 44.65%. Of the three algorithms, naive Bayes
therefore shows the best results for the promotion strategy for incoming new students, with the highest
accuracy of 99.79%.